Robust spoken dialogue systems for consumer products: a concrete application
Abstract
In this paper, we report significant results of a fully implemented voice-operated dialogue system, and in particular its main component: the Dialogue Manager (DM). Like other interfaces, spoken interfaces require careful design, which implies a good analysis of the users’ needs throughout the dialogue. The VODIS project¹ has led to the design and development of a spoken interface for the control of car equipment. Because driving the vehicle already imposes a heavy workload, spoken communication provides a potentially safe and efficient mode of operating the car equipment. To this end, we present the main characteristics of the task model specified during the design stage, and show how its features specific to spoken communication made it possible to implement a robust dialogue.
1. GENERAL REQUIREMENTS
To deal with the specificities of a spoken dialogue in the car, the activities within the project have focused on four main tasks: a well-performing automatic speech recognition (ASR) unit, the design of the interface, the integration of all system modules, and the task model, which serves as a backbone for modelling all relevant components of the interaction in the integrated system. Of these, we will mainly describe the last two, which are closely related to the content of the Dialogue Manager [1]. First, however, we summarise the outcomes of the first two activities, taking into account the specificities of the in-car environment [2].
1.1. Signal processing for a well-performing ASR
The vocal interface has been designed to operate robustly in an acoustically adverse environment: the speech picked up by the far-talk microphone (mounted on the ceiling of the vehicle) is potentially corrupted by other speech or music signals coming from the car audio system, and by ambient noise due to the tires, wind, other vehicles, etc.
These distortions are tackled by two different signal-processing modules: the audio-system signals by an acoustic echo canceller, and the ambient noise by a noise reduction scheme. The acoustic echo canceller operates on the signal coming directly from the far-talk microphone and on the audio signal played by the loudspeakers. The electro-acoustic path loudspeaker-room-microphone is modelled by a 450-tap FIR filter whose coefficients are adapted by an NLMS algorithm, enhanced by special measures to ensure fast convergence and stability in a noisy environment [3]. Thus robust voice operation is feasible even if the driver is listening to an audio source of considerable volume level. This module has been implemented on a DSP board plugged into the PC that hosts the entire vocal interface.
The noise reduction module has been merged with the feature extractor of the speech recogniser, which is fed with the signal coming from the echo canceller. A spectral subtraction scheme as discussed in [4] is applied to the MEL spectral coefficients. The noise power in each frequency band is estimated following the principle of minimum statistics, i.e. by observing the minima in a smoothed version of the power spectral density.
¹ For more information, please refer to http://werner.ira.uka.de/VODIS and to [6].
1.2. Interface design
The design of the interface has primarily been influenced by the choice of speech as the main mode of human-machine interaction [5]. This choice is based on two motivations:
• Since the driving task puts heavy demands on the users’ gestural channel (their hands), using speech to operate the system does not require additional use of that channel.
• Though the number of functionalities available on the interface increases with the sophistication of car equipment components (the complete system used in VODIS offers more than 80 functionalities), the space available on the dashboard is generally very limited.
Opting for a tactile interface would therefore force a dialogue design with a large number of sub-menus, and would place a high demand on the driver’s visual channel.
To limit the demands on the user’s visual channel, another important point is the amount of information to provide to the user, and the form in which to convey it. In the remainder of the paper, we will refer to conveyed information as feedback, although the definition of feedback itself is more specific than the general notion of providing information. As always, a choice had to be made between spoken feedback, visual feedback, or a combination of the two. On the one hand, spoken messages, delivered via text-to-speech synthesis (TTS), are perceived by users without distracting their visual attention. On the other hand, TTS messages are transient, so feedback is only accessible at the time the message is spoken. Furthermore, spoken messages are inherently intrusive, while visual feedback is only accessed when users decide to look at the screen. To provide the user with the “right” feedback, every dialogue situation has been carefully studied, so as to define:
5th International Conference on Spoken Language Processing (ICSLP 98), Sydney, Australia, November 30 - December 4, 1998. ISCA Archive: http://www.isca-speech.org/archive
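As a rough illustration of the two signal-processing modules described in Section 1.1, the sketch below shows a plain NLMS echo canceller and a simplified per-band spectral subtraction whose noise estimate tracks the minimum of a smoothed power. It is not the VODIS implementation: the "special measures" for fast convergence and stability mentioned in the text are not reproduced, the minimum-tracking estimate omits the bias compensation that full minimum-statistics estimators use, and the function names, the step size, the smoothing constant, the window length, and the spectral floor are all assumptions chosen for illustration. Only the 450-tap filter length comes from the text.

```python
from collections import deque


def nlms_echo_cancel(far_end, mic, num_taps=450, step=0.5, eps=1e-6):
    """Subtract an adaptively estimated loudspeaker echo from the microphone signal.

    far_end: samples played by the loudspeakers (reference signal).
    mic:     samples picked up by the microphone (echo plus any near-end sound).
    step and eps are illustrative values, not taken from the paper.
    """
    weights = [0.0] * num_taps                          # FIR model of the echo path
    history = deque([0.0] * num_taps, maxlen=num_taps)  # newest reference sample first
    residual = []
    for ref, observed in zip(far_end, mic):
        history.appendleft(ref)                         # oldest sample drops off the right
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = observed - echo_estimate                # echo-reduced output sample
        energy = sum(x * x for x in history)
        gain = step * error / (energy + eps)            # normalise by reference energy
        weights = [w + gain * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def min_stats_subtract(band_powers, window=50, alpha=0.9, floor=0.05):
    """Per-band spectral subtraction with a minimum-tracking noise estimate.

    band_powers: list of frames, each a list of non-negative band powers.
    window, alpha and floor are hypothetical tuning values.
    """
    if not band_powers:
        return []
    smoothed = list(band_powers[0])                     # recursive average per band
    recent = deque(maxlen=window)                       # last `window` smoothed frames
    cleaned = []
    for frame in band_powers:
        smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        recent.append(smoothed)
        noise = [min(band) for band in zip(*recent)]    # minimum per band over the window
        # Subtract the noise estimate, keeping a small fraction of the input as a floor.
        cleaned.append([max(p - n, floor * p) for p, n in zip(frame, noise)])
    return cleaned


def simulate_echo(far_end, path):
    """Hypothetical helper for experiments: convolve the reference with a toy echo path."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(path) if n - k >= 0)
        for n in range(len(far_end))
    ]
```

In this arrangement the output of `nlms_echo_cancel` would be turned into band powers by a feature extractor (not shown) before `min_stats_subtract` is applied, mirroring the order in which the text says the recogniser's front end is fed by the echo canceller. A pure-Python loop over 450 taps is slow, so a shorter filter is more practical for quick experiments.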
Similar resources
Robust and adaptive architecture for multilingual spoken dialogue systems
We present how robustness and adaptivity can be supported by the spoken dialogue system architecture. AthosMail is a multilingual spoken dialogue system for e-mail domain. It is being developed in the EU-funded DUMAS project. It has flexible system architecture supporting multiple components for input interpretation, dialogue management and output generation. In addition to language differences...
Toward Spoken Dialogue as Mutual Agreement
This paper re-envisions human-machine dialogue as a set of mutual agreements between a person and a computer. The intention is to provide the person with a habitable experience that accomplishes her goals, and to provide the computer with sufficient flexibility and intuition to support them. The application domain is particularly challenging: for its vocabulary size, for the number and variety ...
Error recovery for robust language understanding in spoken dialogue systems
In this paper, we proposed an example-based approach aiming at recovering ill-formed inputs to improve robustness of spoken dialogue systems. In this approach, a treebank, which contains example sentences and their correct parse trees, is used to provide clues for fixing the errors of ill-formed inputs. Particularly, the proposed error recovery method is suitable for spoken dialogue application...
Robust and efficient semantic parsing of free word order languages in spoken dialogue systems
This paper presents a semantic parser for spoken dialogue systems. The parser is designed especially for the analysis of free word order languages by providing a feature called order-independent matching. We describe how this feature allows writing of rules for free word order languages in an elegant way (using German as example language) and how it increases the robustness against speech recogn...
Studies on Robust Language and Dialogue Processing for Spoken Dialogue Systems
In spoken dialogue systems, robust language processing for spontaneous speech understanding and robust dialogue processing for achieving user goal are inevitable. Previously, research of speech recognition and research of natural language understanding were done independently. At first glance, it seems to be no problem to combine these two technologies, because the purpose of speech recognition...
Designing a Portable Spoken Dialogue System
Spoken dialogue systems enable the construction of complex applications involving extended, meaningful interactions with users. Building an effective, generic dialogue system requires techniques and expertise from a number of areas such as natural language, computer-human interaction, and information systems. A key challenge is to design a system through which user-friendly applications can be c...
Publication date: 1998